Spatial Ecology and Macroecology

Practical - Week 4

Gabriele Midolo & Carmen Soria (today replaced by Melanie Tietje)

(Department of Spatial Sciences)

2025-10-20

What are we going to see today?

A brief introduction to Species Distribution Modelling (SDM)

  1. What are SDMs?
  2. SDM approaches
  3. SDM workflow
  4. SDM limitations
  5. Resources
  6. SDM practical example

What are Species Distribution Models?

SDM overview

  • SDMs are tools that predict where a species could potentially occur, based on a limited set of observations.

  • They can also be used to estimate a species’ niche from its distribution.

  • SDM is a large and popular field, used mainly in quantitative ecology and conservation.

SDM overview

  • Many software packages and tutorials make SDMs easier to implement.

  • Examples (R packages): sdm, dismo, usdm, ecospat, biomod2, etc.

  • It’s also a young field that changes and advances quickly, and it still faces many open problems and challenges.

SDM overview

What data do SDMs require?

  1. Georeferenced biodiversity observations
    • Individual locations, species presences, richness, etc.
    • Sources: GBIF, OBIS, Movebank, etc. (Practical 1)
  2. Geographic layers of environmental data (the predictors, i.e. independent variables)
    • Climate, land cover, human density, etc.
    • Sources: CHELSA, WorldClim, MODIS, etc. (Practical 3)

SDM overview

How do SDMs work?

SDMs relate the biodiversity observations to the environmental data using a variety of algorithms.

Once this relationship has been modeled, they can predict:

  • Future distributions

  • Areas where the species might already occur

  • Areas suitable for (re-)introduction

  • Distribution of alien species
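To make the idea concrete, here is a minimal sketch of the simplest possible SDM: a logistic regression (a GLM) that relates presence/absence to a single environmental predictor and then predicts suitability for new conditions. The data, the variable names and the hand-written fitting routine are all made up for illustration; in the practical you would use a dedicated package instead.

```python
import math


def fit_logistic(x, y, lr=0.1, n_iter=5000):
    """Fit P(presence) = logistic(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p          # gradient with respect to the intercept
            g1 += (yi - p) * xi   # gradient with respect to the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict(b0, b1, x):
    """Predicted probability of presence for environmental value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# Hypothetical survey: standardized temperature at each site, and
# whether the species was found there (1) or not (0).
temperature = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
presence = [0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(temperature, presence)

# Predict for conditions at unsurveyed sites (or under a future climate).
suitability_cold = predict(b0, b1, -2.0)
suitability_warm = predict(b0, b1, 2.0)
```

A real SDM does the same thing with several predictors and applies `predict` to every cell of the environmental rasters to obtain a map.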

SDM approaches

Different SDM algorithms need different types of occurrence data. These are the three main approaches:

  1. Presence-only methods:
    1. Absence data is not necessary.
    2. They can only estimate habitat suitability, not the probability of occurrence.
    3. They are sensitive to spatial biases.
    4. Example: MaxEnt.

SDM approaches

  2. Presence-absence methods:
    1. Use presence-absence data.
    2. Estimate true probability of occurrence.
    3. Sensitive to imperfect detection.
    4. Examples:
      1. Regression-based methods: Generalized Linear Models (GLM), Generalized Additive Models (GAM).
      2. Machine-learning methods: Random Forest (RF), Boosted Regression Trees (BRT).

SDM approaches

  3. Presence/pseudo-absence methods:
    1. Generate artificial absences (pseudo-absences) so that methods requiring presence-absence data can be used with presence-only records.
      1. Barbet-Massin et al. (2012) is a good starting point on how to generate them.
    2. Examples: GLM, Random Forest (and other presence-absence methods).
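As an illustration of pseudo-absence generation, the sketch below draws random grid cells that contain no presence record. Random selection is only one of several possible strategies (others restrict sampling by distance or by environment), and the grid, the cell coordinates and the function name are hypothetical.

```python
import random


def sample_pseudo_absences(presence_cells, n_rows, n_cols, n_samples, seed=42):
    """Randomly draw grid cells (row, col) that hold no presence record."""
    occupied = set(presence_cells)
    candidates = [
        (r, c)
        for r in range(n_rows)
        for c in range(n_cols)
        if (r, c) not in occupied
    ]
    if n_samples > len(candidates):
        raise ValueError("not enough unoccupied cells to sample from")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(candidates, n_samples)  # sampling without replacement


# Hypothetical 5 x 5 study area with three occupied cells.
presences = [(0, 1), (2, 3), (4, 4)]
pseudo_absences = sample_pseudo_absences(
    presences, n_rows=5, n_cols=5, n_samples=10
)
```

How many pseudo-absences to draw, and from where, affects the model, so it is worth checking the recommendations for the algorithm you plan to use.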

SDM approaches

Ensemble methods:

    • If we fit many models, which one do we choose?

    • We could choose the one that’s best suited to our data, or the one that performs best.

    • However, a more popular approach is to build an ensemble model.

    • In this approach, predictions from multiple models are combined or averaged to produce a single prediction.

    • The most frequently used package is biomod2.
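The averaging step can be sketched in a few lines. This is not how biomod2 is called; it only shows the arithmetic of an unweighted mean and of a mean weighted by a performance score. The model names, predictions and scores are invented for the example.

```python
def ensemble_mean(predictions, weights=None):
    """Combine per-model suitability predictions cell by cell.

    predictions: dict mapping model name -> list of predictions (one per cell)
    weights:     optional dict mapping model name -> weight (e.g. a performance score)
    """
    models = list(predictions)
    if weights is None:
        weights = {m: 1.0 for m in models}  # unweighted mean
    total = sum(weights[m] for m in models)
    n_cells = len(predictions[models[0]])
    return [
        sum(weights[m] * predictions[m][i] for m in models) / total
        for i in range(n_cells)
    ]


# Hypothetical predictions of three models for three raster cells.
predictions = {
    "glm": [0.2, 0.8, 0.5],
    "gam": [0.3, 0.7, 0.6],
    "rf": [0.1, 0.9, 0.4],
}
# Hypothetical performance score of each model, used as weight.
scores = {"glm": 0.6, "gam": 0.7, "rf": 0.8}

unweighted = ensemble_mean(predictions)
weighted = ensemble_mean(predictions, weights=scores)
```

Weighting by performance gives better models more influence, at the cost of relying on how well that performance was estimated.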

SDM workflow

ODMAP protocol (Zurell et al. 2020)

SDM workflow - Overview / Conceptualization

What are our research objectives?

Which taxa are we working with? What is our study area, and at what scale?

What data is available?

SDM workflow - Data preparation

Obtain:

  • Biodiversity data (generally point observations)
  • Environmental data (generally rasters)

What kind of biodiversity data do we have?

  1. Presence only
  2. Presence / absence

SDM workflow - Data preparation

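Ensure that the temporal and spatial scales of the biodiversity and environmental data match

Clean biodiversity data of unreliable observations:

  • Centroids (records assigned to the centre of a country or region)
  • Outliers
  • Duplicates
  • Institutions (records located at museums, herbaria, etc.)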
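A minimal sketch of some of these cleaning steps is shown below. It drops records with impossible coordinates (an extra check not listed above), records that sit on a supplied list of centroids, and duplicates. Outlier and institution checks need reference data and are left out. The function name, the rounding precision and the coordinates are assumptions made for the example.

```python
def clean_records(records, centroids=(), decimals=4):
    """Return records without invalid coordinates, known centroids or duplicates.

    records, centroids: iterables of (longitude, latitude) in decimal degrees
    decimals:           precision used when comparing coordinates
    """
    centroid_set = {(round(lon, decimals), round(lat, decimals)) for lon, lat in centroids}
    seen = set()
    cleaned = []
    for lon, lat in records:
        # Drop coordinates that cannot exist
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            continue
        key = (round(lon, decimals), round(lat, decimals))
        # Drop records on a centroid and records already seen (duplicates)
        if key in centroid_set or key in seen:
            continue
        seen.add(key)
        cleaned.append((lon, lat))
    return cleaned


# Hypothetical records: one duplicate, one impossible longitude, one centroid.
records = [(10.0, 50.0), (10.0, 50.0), (200.0, 50.0), (12.5, 41.9), (9.0, 48.5)]
known_centroids = [(12.5, 41.9)]

cleaned = clean_records(records, centroids=known_centroids)
```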

SDM workflow - Data preparation

Generate pseudo-absences (if necessary)

SDM workflow - Data preparation

Remove collinear environmental variables

12 variables from the 19 input variables have collinearity problem: 
 
wc2.1_30s_bio_16 wc2.1_30s_bio_17 wc2.1_30s_bio_19 wc2.1_30s_bio_18 wc2.1_30s_bio_6 wc2.1_30s_bio_4 wc2.1_30s_bio_1 wc2.1_30s_bio_12 wc2.1_30s_bio_10 wc2.1_30s_bio_7 wc2.1_30s_bio_5 wc2.1_30s_bio_15 

After excluding the collinear variables, the linear correlation coefficients ranges between: 
min correlation ( wc2.1_30s_bio_9 ~ wc2.1_30s_bio_8 ):  0.04396524 
max correlation ( wc2.1_30s_bio_13 ~ wc2.1_30s_bio_2 ):  -0.6214331 

---------- VIFs of the remained variables -------- 
         Variables      VIF
1  wc2.1_30s_bio_2 3.850564
2  wc2.1_30s_bio_3 2.702829
3  wc2.1_30s_bio_8 2.756529
4  wc2.1_30s_bio_9 2.297982
5 wc2.1_30s_bio_11 6.555181
6 wc2.1_30s_bio_13 2.535430
7 wc2.1_30s_bio_14 2.942497
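The logic behind a stepwise VIF selection like the one reported above can be sketched as follows: compute the variance inflation factor (VIF) of every variable, drop the variable with the highest VIF if it exceeds a threshold, and repeat. The VIF of a variable is the corresponding diagonal element of the inverse correlation matrix. The threshold of 10, the variable names and the values are assumptions for illustration, and this is not the code that produced the output above.

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(data):
    """VIF of each variable: diagonal of the inverse correlation matrix."""
    names = list(data)
    corr = [[pearson(data[a], data[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def stepwise_vif(data, threshold=10.0):
    """Repeatedly drop the variable with the highest VIF until all are <= threshold."""
    kept = dict(data)
    while len(kept) > 1:
        vifs = vif(kept)
        worst = max(vifs, key=vifs.get)
        if vifs[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)


# Hypothetical predictors sampled at eight sites; temp_max closely tracks temp_mean.
env = {
    "temp_mean": [5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0],
    "temp_max": [10.1, 11.9, 14.2, 15.8, 18.1, 19.9, 22.2, 23.8],
    "precip": [800.0, 820.0, 640.0, 910.0, 700.0, 760.0, 880.0, 690.0],
}
kept_variables = stepwise_vif(env, threshold=10.0)
```

Only one of the two temperature variables survives, because each is almost perfectly predicted by the other.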

SDM workflow - Data preparation

Spatial thinning - Keep only one presence / absence per environmental raster cell

Separate data into training (modelling) and testing

$initial
[1] 3024

$kept
[1] 1136

$out
[1] 1888
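Both steps can be sketched in a few lines: thinning keeps one record per grid cell, and the split sets aside a random share of the thinned records for testing. The cell size, the test fraction and the coordinates are assumptions made for the example, and the summary only mimics the initial / kept / out counts shown above.

```python
import math
import random


def thin_to_cells(points, cell_size):
    """Keep the first record found in each square grid cell of side cell_size (degrees)."""
    kept = {}
    for lon, lat in points:
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        kept.setdefault(cell, (lon, lat))  # later records in the same cell are ignored
    return list(kept.values())


def split_train_test(records, test_fraction=0.3, seed=1):
    """Randomly split records into a training set and a testing set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical presence records as (longitude, latitude).
points = [
    (10.01, 50.01),
    (10.02, 50.02),  # same cell as the first record
    (10.40, 50.10),
    (11.30, 51.70),
    (11.31, 51.71),  # same cell as the previous record
]

thinned = thin_to_cells(points, cell_size=0.25)
train_set, test_set = split_train_test(thinned, test_fraction=0.3)

summary = {
    "initial": len(points),
    "kept": len(thinned),
    "out": len(points) - len(thinned),
}
```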

SDM workflow - Model fitting

Model selection:

  • Single model? Which?

  • Ensemble? How to average over models?

Which model settings should we use?

  • Proportion of data used for testing vs. training

  • Number of cross-validation folds (data partitions)

  • Number of repetitions per fold (robustness vs. computation time)

If we want to produce binary predictions, which threshold should we use?

It all depends on our data and our objectives
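To see what the cross-validation settings mean in practice, the sketch below builds the training and testing indices for a repeated k-fold scheme. The number of records, folds and repetitions are arbitrary choices for illustration, and the function name is hypothetical.

```python
import random


def repeated_kfold(n_records, n_folds=5, n_repeats=2, seed=0):
    """Return a list of (repeat, fold, train_indices, test_indices)."""
    rng = random.Random(seed)
    splits = []
    for rep in range(n_repeats):
        indices = list(range(n_records))
        rng.shuffle(indices)  # a new random partition for every repetition
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for k, test_idx in enumerate(folds):
            # Train on all folds except the one held out for testing
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            splits.append((rep, k, train_idx, test_idx))
    return splits


splits = repeated_kfold(n_records=20, n_folds=5, n_repeats=2)
n_model_fits = len(splits)  # one model is fitted per split
```

With 5 folds and 2 repetitions, every algorithm is fitted 10 times, which is why more folds and repetitions give more robust estimates but take longer to compute.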

SDM workflow - Model assessment

Exploration of response curves

Assessment of model coefficients and variable importance

Do our results make sense?

SDM workflow - Model assessment

Model performance metrics:

  • AUC: area under the receiver operating characteristic (ROC) curve. Values closer to 1 are better, but very high values may indicate overfitting.

  • TSS: true skill statistic (sensitivity + specificity - 1; ranges from -1 to 1)

  • Sensitivity: true positive rate (proportion of presences correctly predicted)

  • Specificity: true negative rate (proportion of absences correctly predicted)

Depending on your goal, you need to balance correctly predicted presences against correctly predicted absences
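These metrics are easy to compute by hand, which helps to understand what they measure. The sketch below derives sensitivity, specificity and TSS from a confusion matrix at a given threshold, computes AUC as the probability that a random presence gets a higher prediction than a random absence, and picks the threshold that maximizes TSS (one common option among several). The observations and predictions are made up.

```python
def confusion(observed, predicted, threshold):
    """Count true/false positives and negatives; presence is predicted if p >= threshold."""
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        called_present = p >= threshold
        if o == 1 and called_present:
            tp += 1
        elif o == 1:
            fn += 1
        elif called_present:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(observed, predicted, threshold):
    """Return sensitivity, specificity and TSS at the given threshold."""
    tp, fp, tn, fn = confusion(observed, predicted, threshold)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity, sensitivity + specificity - 1


def auc(observed, predicted):
    """Share of presence/absence pairs ranked correctly (ties count as 0.5)."""
    pos = [p for o, p in zip(observed, predicted) if o == 1]
    neg = [p for o, p in zip(observed, predicted) if o == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def max_tss_threshold(observed, predicted):
    """Threshold (among the predicted values) with the highest TSS."""
    return max(sorted(set(predicted)), key=lambda t: evaluate(observed, predicted, t)[2])


# Hypothetical test data: observed presence (1) / absence (0) and model predictions.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

sens, spec, tss = evaluate(observed, predicted, threshold=0.5)
model_auc = auc(observed, predicted)
best_threshold = max_tss_threshold(observed, predicted)
```

Here a threshold of 0.5 gives a sensitivity and specificity of 0.75 each (TSS = 0.5), while lowering the threshold to 0.4 captures all presences and raises TSS to 0.75.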

SDM workflow - Predictions

Map the potential distribution predicted by the fitted model.

The model can also be projected to a different area or time period.

SDM limitations

Underlying assumptions:

  • Species are at equilibrium with the environment

  • Species and environment are well sampled

  • All the main factors determining the species’ distribution are included in the model

The observation process (e.g. spatially biased sampling, imperfect detection) can also bias our results

Resources: Tutorials

Let’s code an SDM!